Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech
Abstract
In this paper, we present a new longitudinal and bilingual broadcast database designed for speaker clustering and text-independent verification research. The broadcast data is extracted from the archives of Omrop Fryslân, the regional broadcaster in the province of Fryslân, located in the north of the Netherlands. Two speaker verification tasks are provided in a standard enrollment-test setting with language-consistent trials. The first task contains target trials from all available speakers appearing in at least two different programs, while the second task contains target trials from a subgroup of speakers appearing in programs recorded in multiple years. The second task is designed to investigate the effects of ageing on the accuracy of speaker verification systems. This database also contains unlabeled spoken segments from different radio programs for speaker clustering research. We provide the output of an existing speaker diarization system for baseline verification experiments. Finally, we present baseline speaker verification results using the Kaldi GMM-UBM and DNN-UBM speaker verification systems. This database will be an extension of the recently presented open source Frisian data collection and is publicly available for research purposes.
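The speaker selection behind the two verification tasks can be illustrated with a minimal sketch. It is not the authors' procedure: the `(speaker_id, program_id, year)` record layout and the function name `select_speakers` are hypothetical, and treating the second task as the subset of first-task speakers seen in at least two distinct years is an assumption drawn from the wording of the abstract.

```python
from collections import defaultdict


def select_speakers(segments):
    """Pick speakers eligible for each verification task.

    `segments` is an iterable of (speaker_id, program_id, year) tuples.
    This layout is hypothetical and is not the database's actual format.
    """
    programs = defaultdict(set)  # speaker -> distinct programs
    years = defaultdict(set)     # speaker -> distinct recording years
    for speaker, program, year in segments:
        programs[speaker].add(program)
        years[speaker].add(year)

    # Task 1: speakers appearing in at least two different programs.
    task1 = sorted(s for s, p in programs.items() if len(p) >= 2)
    # Task 2 (assumed): the subgroup of task-1 speakers recorded in
    # more than one year, used to study ageing effects.
    task2 = sorted(s for s in task1 if len(years[s]) >= 2)
    return task1, task2
```

Target trials for a task would then pair enrollment and test segments of the same selected speaker, keeping the language of both segments identical to satisfy the language-consistent setting.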
Similar resources
Open Source Speech and Language Resources for Frisian
In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân which is located in the north of the Netherlands. The native speakers of Frisian are Frisian-Dutch bilingual and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech da...
A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
We present a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is mostly spoken in the province Fryslân and it is the second official language of the Netherlands. The recordings are collected from the archives of Omrop Fryslân, the regional public broadcaster of the province Fryslân. The database covers almost a 50-year time span. The nativ...
Investigating Bilingual Deep Neural Networks for Automatic Recognition of Code-switching Frisian Speech
In this paper, a code-switching automatic speech recognition (ASR) system built for the Frisian language is described. Frisian is mostly spoken in the province Fryslân which is located in the north of the Netherlands. The native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. In the scope of the FAME! Pr...
Age of acquisition and naming performance in Frisian-Dutch bilingual speakers with dementia
Age of acquisition (AoA) of words is a recognised variable affecting language processing in speakers with and without language disorders. For bi- and multilingual speakers, their languages can be differentially affected in neurological illness. Study of language loss in bilingual speakers with dementia has been relatively neglected. Objective: We investigated whether AoA of words was associated...
Noise Clustering-Based Speaker Verification
The normalisation method for speaker verification proposed in this paper is based on the idea of the noise clustering method in fuzzy clustering. The proposed method can reduce false acceptance errors and apply to all current normalisation scores. Experiments performed on the ANDOSL and YOHO speech corpora show better results for the proposed method.